This project analyses Virat Kohli's ODI batting record between 18-Aug-08 and 22-Jan-17.
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
data=pd.read_csv('Virat_Kohli_ODI.csv')
print(data)
Runs Mins BF 4s 6s SR Pos Dismissal Inns Opposition \
0 12 33 22 1 0 54.54 2 lbw 1 v Sri Lanka
1 37 82 67 6 0 55.22 2 caught 2 v Sri Lanka
2 25 40 38 4 0 65.78 1 run out 1 v Sri Lanka
3 54 87 66 7 0 81.81 1 bowled 1 v Sri Lanka
4 31 45 46 3 1 67.39 1 lbw 2 v Sri Lanka
.. ... ... ... .. .. ... ... ... ... ...
127 45 64 51 2 1 88.23 3 caught 2 v New Zealand
128 65 152 76 2 1 85.52 3 caught 1 v New Zealand
129 122 147 105 8 5 116.19 3 caught 2 v England
130 8 6 5 2 0 160 3 caught 1 v England
131 55 81 63 8 0 87.3 3 caught 2 v England
Ground Start Date
0 Dambulla 18-Aug-08
1 Dambulla 20-Aug-08
2 Colombo (RPS) 24-Aug-08
3 Colombo (RPS) 27-Aug-08
4 Colombo (RPS) 29-Aug-08
.. ... ...
127 Ranchi 26-Oct-16
128 Visakhapatnam 29-Oct-16
129 Pune 15-Jan-17
130 Cuttack 19-Jan-17
131 Kolkata 22-Jan-17
[132 rows x 12 columns]
# Strip the "*" marker from Runs; regex=False treats "*" as a literal character.
data["Runs"] = data["Runs"].str.replace("*", "", regex=False)
data["Runs"]
0 12
1 37
2 25
3 54
4 31
...
127 45
128 65
129 122
130 8
131 55
Name: Runs, Length: 132, dtype: object
data["Runs"] = data["Runs"].astype(int)
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 132 entries, 0 to 131
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Runs        132 non-null    int64
 1   Mins        132 non-null    object
 2   BF          132 non-null    int64
 3   4s          132 non-null    int64
 4   6s          132 non-null    int64
 5   SR          132 non-null    object
 6   Pos         132 non-null    int64
 7   Dismissal   132 non-null    object
 8   Inns        132 non-null    int64
 9   Opposition  132 non-null    object
 10  Ground      132 non-null    object
 11  Start Date  132 non-null    object
dtypes: int64(6), object(6)
memory usage: 12.5+ KB
total_runs = data["Runs"].sum()
total_runs
6184
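In ODIs, a batting average of 35 to 37 is normally considered good. The mean below is runs per innings, which is a lower bound on the conventional batting average (runs divided by dismissals).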
data['Runs'].mean()
46.84848484848485
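The figure above divides by every innings, so it can understate the conventional batting average, which divides total runs by the number of dismissals. Below is a minimal sketch of that calculation. It assumes not-out innings are labelled "not out" in the Dismissal column (that label does not appear in the printed sample, so treat it as an assumption); the helper name batting_average and the sample rows are made up for illustration.

```python
import pandas as pd

def batting_average(frame, not_out_label="not out"):
    """Total runs divided by the number of dismissed innings."""
    # Assumption: innings where the batter was not dismissed carry `not_out_label`.
    dismissals = (frame["Dismissal"] != not_out_label).sum()
    if dismissals == 0:
        return float("nan")  # the average is undefined without a dismissal
    return frame["Runs"].sum() / dismissals

# Hypothetical rows, not taken from Virat_Kohli_ODI.csv.
sample = pd.DataFrame({
    "Runs": [12, 37, 25, 54],
    "Dismissal": ["lbw", "caught", "not out", "bowled"],
})

per_innings_mean = sample["Runs"].mean()  # 128 runs / 4 innings
average = batting_average(sample)         # 128 runs / 3 dismissals
print(per_innings_mean, round(average, 2))
```

On the notebook's frame the same helper would be called as batting_average(data), once the label used for not-out innings has been checked with data["Dismissal"].unique().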
Now let us look at the trend of runs he scored across his innings.
matches = data.index
figure = px.line(data, x=matches, y="Runs",
                 title="Runs Scored by Virat Kohli Between 18-Aug-08 and 22-Jan-17",
                 template="plotly_dark")
figure.show()
In some matches Kohli scored a century or came close to one. Next, let us analyse his performance by batting position.
data['Pos']=data['Pos'].map({3.0: "Batting At 3", 4.0: "Batting At 4", 2.0: "Batting At 2",
1.0: "Batting At 1", 7.0:"Batting At 7", 5.0:"Batting At 5",
6.0: "Batting At 6"})
Pos=data["Pos"].value_counts()
label = Pos.index
counts = Pos.values
colors = ["gold","lightgreen", "pink", "blue", "skyblue", "cyan", "orange"]
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Number of Matches At Different Batting Positions')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=30,
marker=dict(colors=colors, line=dict(color='black', width=3)))
fig.show()
Virat Kohli batted at number 3 in 68.9% of his innings. Now let's calculate the total runs he scored at each batting position.
label = data["Pos"]
counts = data["Runs"]
colors = ['gold','lightgreen', "pink", "blue", "skyblue", "cyan", "orange"]
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Runs By Virat Kohli At Different Batting Positions')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=30,
marker=dict(colors=colors, line=dict(color='black', width=3)))
fig.show()
72.4% of his runs were scored at batting position 3. From this we can conclude that the 3rd position suits him best.
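The pie chart adds up the runs for each position label. The same totals can be computed directly, together with runs per innings at each position, which is worth checking because a large share of runs partly reflects a large share of innings. This is a minimal sketch on made-up rows; the variable names are illustrative and not part of the notebook.

```python
import pandas as pd

# Hypothetical rows, not taken from Virat_Kohli_ODI.csv.
sample = pd.DataFrame({
    "Pos": ["Batting At 3", "Batting At 3", "Batting At 4", "Batting At 1"],
    "Runs": [100, 50, 30, 20],
})

# Total runs per batting position, largest first.
runs_by_pos = sample.groupby("Pos")["Runs"].sum().sort_values(ascending=False)

# Share of all runs scored at each position, in percent.
share_pct = (runs_by_pos / runs_by_pos.sum() * 100).round(1)

# Runs per innings at each position, independent of how often he batted there.
mean_by_pos = sample.groupby("Pos")["Runs"].mean()

print(share_pct)
print(mean_by_pos)
```

Replacing sample with data gives the figures behind the chart.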
Now let us look at the centuries Virat Kohli scored in the first and second innings.
centuries = data.query("Runs >= 100")
figure = px.bar(centuries, x=centuries["Inns"], y = centuries["Runs"],
color = centuries["Runs"],
title="Centuries By Virat Kohli in First Innings Vs. Second Innings")
figure.show()
Most of his centuries were scored while batting in the second innings.
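The bar chart stacks the runs from each century, so bar height shows total runs scored in centuries rather than how many centuries there were. A minimal sketch of counting them per innings, on made-up rows with illustrative variable names:

```python
import pandas as pd

# Hypothetical rows, not taken from Virat_Kohli_ODI.csv.
sample = pd.DataFrame({
    "Runs": [107, 45, 122, 100, 99, 118],
    "Inns": [1, 2, 2, 2, 1, 2],
})

# Keep only scores of 100 or more, then count rows per innings number.
centuries_sample = sample[sample["Runs"] >= 100]
centuries_per_innings = centuries_sample["Inns"].value_counts().sort_index()
print(centuries_per_innings)
```

On the notebook's data the equivalent call would be centuries["Inns"].value_counts().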
Now let us look at the kinds of dismissals Virat Kohli faced.
dismissal = data["Dismissal"].value_counts()
label = dismissal.index
counts = dismissal.values
colors = ['gold','lightgreen', "pink", "blue", "skyblue", "cyan", "orange"]
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Dismissals of Virat Kohli')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=30,
marker=dict(colors=colors, line=dict(color='black', width=3)))
fig.show()
Kohli is most often dismissed caught, by a fielder or the wicketkeeper. Next, let us see against which teams he scored most of his runs.
figure = px.bar(data, x=data["Opposition"], y = data["Runs"], color = data["Runs"],
title="Most Runs Against Teams")
figure.show()
Virat Kohli has scored heavily against Sri Lanka, Australia, New Zealand, West Indies, and England, but most of his runs came against Sri Lanka.
Now let's look at which teams Virat Kohli scored most of his centuries against.
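Total runs against a team depend on how many innings were played against it, so a per-opposition summary with totals, innings, and runs per innings can back up the reading of the chart. This is a minimal sketch on made-up rows; the column names total, innings, and mean are chosen here for illustration.

```python
import pandas as pd

# Hypothetical rows, not taken from Virat_Kohli_ODI.csv.
sample = pd.DataFrame({
    "Opposition": ["v Sri Lanka", "v England", "v Sri Lanka", "v Australia"],
    "Runs": [54, 122, 31, 8],
})

# One row per opposition: total runs, innings played, runs per innings.
summary = (
    sample.groupby("Opposition")["Runs"]
    .agg(total="sum", innings="count", mean="mean")
    .sort_values("total", ascending=False)
)
print(summary)
```

Running the same aggregation on data would show whether the lead against Sri Lanka comes from more innings or from higher scores per innings.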
figure=px.bar(centuries,x=centuries['Opposition'],y=centuries['Runs'],color = centuries["Runs"],
title="Most Centuries Against Teams")
figure.show()
Most of the centuries were scored against Australia.
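To take strike rate into consideration, we need a new dataset of all the innings in which Virat Kohli's strike rate was above 120. The SR column is stored as text, with "-" where no strike rate was recorded, so those placeholders are blanked out first.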
data["SR"] = data["SR"].str.replace("-", " ")
print(data)
Runs Mins BF 4s 6s SR Pos Dismissal Inns \
0 12 33 22 1 0 54.54 Batting At 2 lbw 1
1 37 82 67 6 0 55.22 Batting At 2 caught 2
2 25 40 38 4 0 65.78 Batting At 1 run out 1
3 54 87 66 7 0 81.81 Batting At 1 bowled 1
4 31 45 46 3 1 67.39 Batting At 1 lbw 2
.. ... ... ... .. .. ... ... ... ...
127 45 64 51 2 1 88.23 Batting At 3 caught 2
128 65 152 76 2 1 85.52 Batting At 3 caught 1
129 122 147 105 8 5 116.19 Batting At 3 caught 2
130 8 6 5 2 0 160 Batting At 3 caught 1
131 55 81 63 8 0 87.3 Batting At 3 caught 2
Opposition Ground Start Date
0 v Sri Lanka Dambulla 18-Aug-08
1 v Sri Lanka Dambulla 20-Aug-08
2 v Sri Lanka Colombo (RPS) 24-Aug-08
3 v Sri Lanka Colombo (RPS) 27-Aug-08
4 v Sri Lanka Colombo (RPS) 29-Aug-08
.. ... ... ...
127 v New Zealand Ranchi 26-Oct-16
128 v New Zealand Visakhapatnam 29-Oct-16
129 v England Pune 15-Jan-17
130 v England Cuttack 19-Jan-17
131 v England Kolkata 22-Jan-17
[132 rows x 12 columns]
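# SR is text; convert it to numbers (blank placeholders become NaN) and keep innings with SR above 120.
strike_rate = data.assign(SR=pd.to_numeric(data["SR"], errors="coerce")).query("SR > 120")
figure = px.bar(strike_rate, x="Inns",
y="SR",
color="SR",
title="Virat Kohli's High Strike Rates in First Innings Vs. Second Innings")
figure.show()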
figure = px.scatter(data_frame = data, x="Runs",
y="4s",
title="Relationship Between Runs Scored and Fours")
figure.show()
figure = px.scatter(data_frame = data, x="Runs",
y="6s",
title= "Relationship Between Runs Scored and Sixes")
figure.show()
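The two scatter plots can be complemented with a number. A minimal sketch, assuming Pearson correlation (the pandas default) is an acceptable measure here, computed on made-up rows:

```python
import pandas as pd

# Hypothetical rows, not taken from Virat_Kohli_ODI.csv.
sample = pd.DataFrame({
    "Runs": [10, 20, 30, 40],
    "4s": [1, 2, 3, 4],
    "6s": [0, 0, 1, 1],
})

# Pearson correlation of each boundary count with runs scored.
corr_with_runs = sample[["4s", "6s"]].corrwith(sample["Runs"])
print(corr_with_runs.round(3))
```

On the notebook's frame the call would be data[["4s", "6s"]].corrwith(data["Runs"]); a value near 1 indicates a strong linear relationship, a value near 0 a weak one.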